comparison process, i.e., the alignment-based approach and the
alignment-free approach.

Alignment-based multiple sequence comparison is implemented by
several different methods. The most typical are the
progressive method, the iterative method and the consensus method. The
progressive method is a heuristic approach [Feng and Doolittle, 1987;
et al., 2018] and has had a wide range of applications [Loytynoja
and Goldman, 2005; Deorowicz et al., 2016; Ayad and Pissis, 2017;
et al., 2018; Rubio-Largo et al., 2018]. Among the three, the
progressive method is more effective and efficient than the other two.

The basic principle of the progressive method is to align sequences
progressively, from the most related pairs to the least related pairs. First, a
pairwise alignment is done for all pairs using a fast non-alignment or
alignment-free sequence comparison method. Based on these initial
comparison results, a hierarchical tree is constructed. Afterwards, the most
related pair of sequences is found and aligned using a homology
alignment approach. The remaining pairs of sequences are then progressively added to an
alignment model, which is expressed as a tree.
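
The first two steps described above (a fast alignment-free comparison of all pairs, followed by construction of a hierarchical tree) can be sketched in base R. This is only an illustrative sketch, not the procedure used by any particular package: the toy sequences, the helper `kmer_counts`, the choice of k = 2, the Euclidean distance and the average-linkage clustering are all assumptions made for the example.

```r
# Toy sequences (hypothetical, chosen only for illustration)
seqs <- c(s1 = "GATGTATGGACCCG",
          s2 = "GATGTATCCACCCG",
          s3 = "CATGTATGGACCCG",
          s4 = "CCAATATCGCTTCT")

# Hypothetical helper: count overlapping k-mers in one sequence.
# The resulting counts serve as an alignment-free profile of the sequence.
kmer_counts <- function(s, k = 2) {
  starts <- seq_len(nchar(s) - k + 1)
  table(substring(s, starts, starts + k - 1))
}

# Build a sequences-by-k-mers count matrix over all observed k-mers
profiles <- lapply(seqs, kmer_counts)
kmers <- sort(unique(unlist(lapply(profiles, names))))
counts <- t(sapply(profiles, function(p) {
  v <- setNames(numeric(length(kmers)), kmers)
  v[names(p)] <- p
  v
}))

# Pairwise distances between profiles, then an average-linkage tree
d <- dist(counts)
guide_tree <- hclust(d, method = "average")

# Each row of `merge` records one joining step, most similar pair first
guide_tree$merge
```

In a progressive aligner, a tree of this kind would decide the order in which sequences are aligned and merged; the alignment step itself is not shown here.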

The clustal series of packages and the msa package are examples of such
programs. msa is a Bioconductor package in which clustal
alignment approaches are implemented. Its main support package
is Biostrings. Suppose the following sequences are required to
be aligned,

GATGTATGGACCCG

GATGTATGGACCCG

GATGTATCCACCCG

CATGTATGGACCCG

CCAATATCGCTTCT
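
As a point of reference, a multiple alignment of these sequences with the msa package might look like the sketch below. It assumes that the Bioconductor packages msa and Biostrings are installed, uses the sequences exactly as they are printed above, and introduces the labels `seq1` to `seq5` purely for illustration.

```r
# Assumes the Bioconductor packages msa and Biostrings are installed
library(msa)
library(Biostrings)

# Sequences as printed above; the names seq1..seq5 are hypothetical labels
seqs <- DNAStringSet(c(
  seq1 = "GATGTATGGACCCG",
  seq2 = "GATGTATGGACCCG",
  seq3 = "GATGTATCCACCCG",
  seq4 = "CATGTATGGACCCG",
  seq5 = "CCAATATCGCTTCT"
))

# Multiple sequence alignment using the ClustalW method provided by msa
aln <- msa(seqs, method = "ClustalW")
print(aln)
```
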

The following R code employs the Smith-Waterman algorithm to
align these five sequences pairwise. Because each pair of sequences
is required to be aligned, a nested for-loop structure is employed. The
output of this code is a matrix of similarity scores between all pairs of
sequences.